Stochastic gradient descent
Stochastic gradient descent is a gradient descent optimization method for minimizing an objective function that is written as a sum of differentiable functions.
== Background ==
Both statistical estimation and machine learning consider the problem of minimizing an objective function that has the form of a sum:
: Q(w) = \sum_{i=1}^n Q_i(w),
where the parameter w^* which minimizes Q(w) is to be estimated. Each summand function Q_i is typically associated with the i-th observation in the data set (used for training).
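For instance (an illustrative case, not spelled out in this excerpt), in least-squares regression on observations (x_i, y_i) the summands can be taken to be
: Q_i(w) = \left(y_i - x_i^\mathsf{T} w\right)^2,
so that minimizing Q(w) yields the ordinary least-squares estimate of w.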
In classical statistics, sum-minimization problems arise in least squares and in maximum-likelihood estimation (for independent observations). The general class of estimators that arise as minimizers of sums is known as M-estimators. However, in statistics, it has long been recognized that requiring even local minimization is too restrictive for some problems of maximum-likelihood estimation, as illustrated by an example due to Thomas Ferguson. Therefore, contemporary statistical theorists often consider stationary points of the likelihood function (or zeros of its derivative, the score function, and other estimating equations).
The sum-minimization problem also arises in empirical risk minimization: in this case, Q_i(w) is the value of the loss function at the i-th example, and Q(w) is the empirical risk.
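As a concrete sketch (not part of the original article), the following Python snippet sets up a hypothetical least-squares problem and writes down the per-example loss and the empirical risk exactly as defined above; the data and the names Q_i and Q are purely illustrative.

```python
import numpy as np

# Hypothetical linear-regression data: row i of X is the observation x_i, y[i] its target.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)

def Q_i(w, i):
    """Per-example squared-error loss: the value of the loss function at the i-th example."""
    return (y[i] - X[i] @ w) ** 2

def Q(w):
    """Empirical risk: the sum of the per-example losses, matching Q(w) = sum_i Q_i(w)."""
    return sum(Q_i(w, i) for i in range(len(y)))
```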
When used to minimize the above function, a standard (or "batch") gradient descent method would perform the following iterations:
: w := w - \eta \nabla Q(w) = w - \eta \sum_{i=1}^n \nabla Q_i(w),
where \eta is a step size (sometimes called the ''learning rate'' in machine learning).
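A minimal sketch of this batch update under the same hypothetical least-squares setup as above; the step size eta and the iteration count are arbitrary illustrative choices, not values taken from the article.

```python
import numpy as np

# Same hypothetical least-squares data as in the earlier sketch.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)

def grad_Q_i(w, i):
    """Gradient of the per-example squared-error loss Q_i at w."""
    return -2.0 * (y[i] - X[i] @ w) * X[i]

def batch_gradient_descent(w, eta=0.001, steps=100):
    """Standard ("batch") update: w := w - eta * sum_i grad Q_i(w), touching every example each step."""
    for _ in range(steps):
        w = w - eta * sum(grad_Q_i(w, i) for i in range(len(y)))
    return w

w_hat = batch_gradient_descent(np.zeros(3))
```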
In many cases, the summand functions have a simple form that enables inexpensive evaluations of the sum-function and the sum gradient. For example, in statistics, one-parameter exponential families allow economical function-evaluations and gradient-evaluations.
However, in other cases, evaluating the sum-gradient may require expensive evaluations of the gradients of all the summand functions. When the training set is enormous and no simple formulas exist, this becomes very expensive, because each iteration requires evaluating every summand function's gradient. To economize on the computational cost at every iteration, stochastic gradient descent samples a subset of the summand functions at every step. This is very effective for large-scale machine learning problems.
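For contrast, here is a sketch of the stochastic variant under the same hypothetical setup: each step samples a single summand function and uses only its gradient, so the per-iteration cost does not depend on the number of examples (sampling one example per step is one common choice; small mini-batches are another).

```python
import numpy as np

# Same hypothetical least-squares data as in the earlier sketches.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)

def grad_Q_i(w, i):
    """Gradient of the per-example squared-error loss Q_i at w."""
    return -2.0 * (y[i] - X[i] @ w) * X[i]

def stochastic_gradient_descent(w, eta=0.01, steps=2000):
    """Each iteration evaluates the gradient of one randomly sampled summand Q_i
    instead of summing the gradients of all n summands."""
    n = len(y)
    for _ in range(steps):
        i = rng.integers(n)            # sample one summand function
        w = w - eta * grad_Q_i(w, i)   # inexpensive update using that summand alone
    return w

w_hat = stochastic_gradient_descent(np.zeros(3))
```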

Excerpt source: the free encyclopedia Wikipedia. Read the full article on "Stochastic gradient descent" at Wikipedia.